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X^^4 5 4-C. y*-^©*asj F KM 1/ "£■"£■© 
B. *©<MHft»c*W4»3Lr.9h©wa-bWW 

[0072] V*f ASS, 7 ^4 5 6 -C, Z<D*)l 

■e© ffr-zrtD&m? — ?&&tz. JMfrWfc 
a. 

[0 07 3] TTJ - 31-'/ F 
TNJ - l l- » F 

[0 07 4] C©«^{CB. 5ai»^«33i» 

ffe©7 T i> *^BJ©^^K*5 | '^ 
, [0 0 75] Xf^'45 8t. fBffi*»rr-^X© 

zmm ur. s&^S6^iJ©^-c.©«»®S(cRl urs 
*n-jb«Hi*J«f'r*Ci*>"C**>. www* 

(4B*x*»?i/t. gT5e©a«fltg«:*j^-c©*sa 
^ - mmz^tf t ^ c t 

[ 0 0 7 6 ] H 1 4 B. WM)ai-» F i'rt'-T'KRI 1/ 

tswt/iH ojeattf 9 - ^ + - f r 

o •& ffl»©W-^B. #t&-5fc#Mi^M->tEg 

^^j-^oiiett^jifCiji^tcB (Wfe. 

[0 07 7] 97*5 0 2-C. S/Xf Att. 



[0078] 3CK. Z??7504-Q> 

t O 0 7 9] ga^ „ h WH9*—7 
Htm**-?) (crabT-oOiUn-A^^ifcfc 
». Xt^5 o e-c. iS«a§*iA*$ns. *(c io 

x«. RftttHfeft. -0*. flHHBMH©i> -f >F$ 
[0 08 0] Xf^5 10T, fiiCD^^ 2 o 

* 7'5 0 6 (CM!) . %©ttSftWcH VXiZ&*L m 
[0 0 8 1 ] ® l 5tt % -o©i - K^-yicWt 

#6U Sax ,.|. $ fcfcga-^jgft t,fc 
[008 2] raj -- 3 i-, f 

[ 0 0 8 3 ] C ©*§££«, 4 -3©We*»3-A© -5 

[0084] mmwx -e©a - , h >/,\,-7 mL 



0-2 72 000 
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©-wibr. (-o©ax v bifi^mmmm 

U ffc©ax, ^8©&S£P?c^t-«t5&i§£{cj§ 

u *© 5 «o-o«t«nuss«iRf fcor *sw 
smawwsfens. «©wMuntcnt/ctt m 

[0085] *r ? 7*6 0 67?. VWmi&mZtlZ 

[ 0 0 8 6 ] H 1 6 tt. *«3-A*Ba*||ffrSfc» 
_H6i/fcj:-$cc. SHcfeBiSgKcbi,* 
*EW**»*C4*:J:9, **9a>-7omSmft> 

fitg©-^©TOK;*tLrtt8*a»fig. 

[ 0 0 8 7 ] Z t y y'7 o 2 T\ ^f-Att, «!£:xi 

t-^l&Sfig©*^^^^^^^^^^^^ 
[ 0 0 8 8 ] 4>fc < i 2 ^©#JSie ? ij*ii % , TO* 

*«&r*.-ccr. *»3-*t#jBsw©»i&r5: 
[0089] TM^uit; m8mtatt-,xm* 

»K. «T{cjB#ftfia(*p{f*. c©0}-ctt. «T©J: 
5 1 o©«?i 7-4 2 0(DK§ 



[0090] 



[009 1] 



[0092] 
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ACGGATGAGATACGA 
ACTGATGAGATACGA 

ACTGATGAGATACGA 
ACTGATGAGATACGA 
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[0093] TlttSIWteMI* 1 . IBrStitOiWI 

2 -?©#l$E? | JKfc<,>"T:g& o X i^fcKftS*^*. 
cne>#£©&*fts©*"c< (wta-jwc^sti 

^wjWocc. p-<j>m**-ix\xm&&m\ 20 

KftS. 

[0094] ft***-* 1 ©**t£f#£*^-W>© 

k. ■c©V*--/*W.>&*i*. -tK©1B#&0<». s 
S3-;I/8±?C©..^^-^0— "^**« ^^^^ cc 

-c. -*RftciB. ££&§£&?©*#£**£< 

ft-. .#JBf^Ji;tB9^i<MBIO^«#«cJ:.»), 
[00?5]X>> 7*7 0 4 -C. ?'©**]£ 

[0 09 6] xt-^os-c, f^^-^©^fn 
*>>©*tt£»£#iH&-c* t* ^ * 

■Cft V *itf. 2 JjLh©f^ - 7*W C^Wft*** 
fcocifc&S. t ©<fcVj:if £KB. *f-»77 12 

3Li7h*»*tHC*»*»WlH-r-*. *>9— 3©* 
;U-^©J.^-5 h ©»»3 I/O > 
B. |SJCJSa*»*tfffl*tt§^-^*^^ aMf 
toft*. ;Btta#j6*»BC»£KM©**-^**K 



AC GGATGAGATACGT 
ACTGATGAGATACGA 

rsj)5ffittc*i«:iR3esn4fe© - ct*«t.< 



[ 0 0 9 7 ] 0 1 7 B. --5* 5 W*«i5t©* v ^K*J 

2 B. A-if-Kgl* «rf****5W"**IM>* * ,j " 

«*J^3nS. CO«ri*. PRT4 4 0A-C. Ctl 
B. H I V«?-f ;I/*©7>*H2>*M$ (7"af7-H2 
XKBHR) -CAS. «HKW«. a*. tftflsKWiJt 

as, cttKH5£$ftS*>©'CB&t, s . 

[0 09 8 ] X*»->«t«8 0 6KB. ^P-^EW 
(7M ) kW*«$ ©5? * Vt? FEJWSt^Sn 
S. ? F©&»4H* S * f 0 6 

[0099]** - >m& 0 8 K.\t.mmmo 

1-7*4* ..CHPOttWPtJwSfc*) K 

7-r-f^KB (0f*B. • CMPfl>tJ»f-t^3ti6> 
KB. ««©*»:/*&AaStifc^-** , ** , " l *» 

40 ->»#8 0 8 K^n^n*. 

[0 100] f»^7-f*te<fcC«l67T-f*HA 
•r.S<fc'5iCb'CfeJ:^. *f'J->«*8 1 OKI*. * 

fg*T*©^(*S29iJ©«f** 5 «^Sti*. ii^- a-lf- 
#f§ mK^(*ie?>J*Siai-C Sii^W. 1^(*iB5»J 
©«»ib-C«R3n*. 1 2 KB. 

^©^i'u^-^FE^^s^s- 

8 1 2K*sW5^i'l'*^F©^SfiSI*. **"J- 
50 >«tt8 0 6±Kafi*3ftfcfc©iHI»?A4. ^ 
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t0 10 1] @ i7«. ^ij-lf,^ M(DSi 
ft^JCLfcO (^KLfcO) . 

[-0 1 0 2 J o 8 

l 8©wr». hi otcn^,»rw»'o^ ritK«^< 

**gMf*C 4 WUf; t©W-Ctt. Jt© 

HJM ( r Ratio J ) *51. 2{CSJt3tirc»5*J a 

[ 0 1 0 3 ] m 1 8 (C^-T i 5 (c, a— tf-*jg^ t ^ 

ti**mB*4 4 0 - 2 Att; H 1 8 ¥tiJ£A3nr 
CO ** 3 -^*^»©*lfc9>6»6#lfc«>©-c*5 > c 

tifitotoz. cc-c, aft©*****/^*^-.*- 

*»^C«Tr4to-c*J:u. Wt>*>**4; Hl7fc 
JOTS 1 8 r4fWBn 4 4 0 - 2 A 4 bX*Htitx * 

©T&3. Hi 8«C7jrf,£5«:. «f-ii % >tl-e*l 

[0 104]H17{C5WJ:5K. fc#E*j440-2 
A©«»©|}(ct>;*ar >/->7-C3>©+v-i.*J^ 



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*rt3C4#-c*3. <*fc t H 1 8 (cijVf J; <5 «c g£ 

10 1 0 5 J i£iga©!£ 
Hl9tt. ffi£W©^^f^7Q-74^^7-a_7 

©;> 9 F«aatt*ttiw« ciw») ise^© 

•oras hfciawtff-r s ^ n - yzmrz. m& 
m (tv a) «c*ws**te (mm) ^jatctt, -en 

^C3fje©gflf>K«({c*fUr^^{cffiafie,«:3tiS 
20 f*5S±*te (PM) ^n-7-dsffS^s o 

C0106] &ft$iii*£l>»? 7*affiK:as t/< fj 

?£tf**te7-p-7-©/W T"!/ * YmSfSgUfiK®.* 

o(c^6 omh<omu-,itm.y-u-v'ifi%%tih& 
» ^*c©«r^f oft < nil., mm 

J: 5*. *IBB©|5ll*>6jaSi^ « C 4 ft < t hh<0 
[0 10 7] -i^UKC. SWiBrlj (**(,>«££*) 

ft< 4fe-o©5?i.u^^ K^*f^*J#stiS. fK, 
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[oio8] ^^f^7D-7©g§ S j, ®m 

VIZtiZ. «itf. ^Kffi?»J^r20Mtt40rtJ: 

[o 1 09] ±^Lfc«fc^{c, mm&M. mzti> 



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^j, 7a-7K5>J. *»±-c©^a-^{4g^. ft 

[0 1 1 0] Bll 9K5VTJ:.9«:. M^'J^FKlS 
& v Xf"? 79 0 2-0. 3>e»->VXf-A(C, Stfc 
WB%^&V--?tXtt&?™-'*<Dn<<1*) v F 10 

ffM5SS* 5 A^3ni». -v fjwssmb*. «« 

< i *>-o©* * H-^Ffl^Ste^o-^fclltr-a 
[0 1 1 1 ] Xf-^9 0 4t, 3>b-*-*^Xf-A 

,55. ^©^a^n^i**^-^^ 
y * F»fl»«*tttW"*. ifiWWSl/'COSJB 
KB. FJ&fiSSSK 2( 

-Htttc. *M>^0-^©'W^'J» F»«SrK 

#*i=>fts. fit. -^©^a-^ica-^-ciPJjSfctT 

fcfe-?#&3iLTO*fr£5:fe©^* J ft*>*' 1 *' 
#©7a-~.7©>W7»J v F^RiE«S ; &It^-r-&A<*W 
ft#a©i»WL-tB. 0 t#iltHI«. 

[0 1 1-21 tXttG-fV-?®"* 

7«; vmmmatMt^^ *f ?79 0 6-c. a 3 

OS <#Rt/tlr>S) . «ft6iil»«l>. 

«. #&l/-CC>ttl» (ISBl/C^&lO Ci^T^ 

[o i i 3] H2 0 wawitii^'catBWWi 

N*t©5S£*t£:7a - 7i**t£7o - 7<cWf S£© 

v y mmmt. £«±?7a-7{ow7<; ? m 

®Xb?>. HT. S^^o-^W^'Jj f#« 
gig* I pm-CmU **f£7a-7©;W7') » FtW 

SigE£ I imfCS^. 

[0 114] XT' -;7*9 5 4-C. -*f©^a-7©>W 
IV v FfcritB*****"*"*. XO^v 79 5 6-C. 
a > FflMWtt*. *©»©&" 4 7 <J ? F 

,< , ^ y ^ h, > F %g K «fc 9 K L-t J: i». 
[0 1 1 51 7a-7*t©/W7V 
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y FffOnK*lt£llfii <D) teJ:^©^ (R) t 

iti&tz. BP^. 7a-7*t©>'W7'j 9 Fj^aasK© 
n ( i m- 1 m) #*5WfteLh-c***«*. a 

o, 7a- 7ft©-' W 7 'J v F^fiS*S©lt ( I pm/ 1 

e>©igffi«. as. -oa*c»»«»©»&H>88M* 

*S. W*.tt\ «&M§*2 0Ktt©H«*l. 2KS 

[0 116] Ipm- lirni> = D. 51*. l../I..>- 
R©J§£tCB. ^9 79 6 0-CNPOSftliWlStt 

5. nposb. ae?*«»Br*"i*tt* s flR'»ti* 
?®v&z. nttomw&omK. npo 

[0 1 1 7] Xf- ^79 6 2-CB. I mm- I pm> = DR. 

I mo/ 1 pm> = R*lfiiffi-r?»*>i' i 5*>©««* i ^5^ 
S. C©a#*0St-3»^K«. Xf-^79 64"CNN 
EG*llS»lStii. NNEGB. 

7a-7*f©^=S:^-r<ii-C*2). NPOS4lHl««c s N 

[0 1 1 8] «£^#^btl>£C££7r;-*\ 
«. »«L-CC»ftl»Ci*wfr'W^y » FJKl^SK* 
W^SS^ci-^CCHU-C. Xf-y79 6 6"C. tb© 
M^W (LR) ( I D 1 F) =fetW-r*. L 

Rtt. 7a-7*f©^^^V5' F^fiS«S©tt ( Ipm/ 
Inn.) ©M«*i4CtK«fcO*«>enS. ID 
I F(i> 7a-7>lst©^W7y 9 F^fiX^S©M ( I pm 
i -lmn) ^4SC<k{Cj;0*»?>n*. Xt^7'9 6 8 
■C^©^a -7*f©^ -T ^ 'J » FJBftSWEWWEl - .* t 

[0 119] 79 7 2T. iWEtiafeffllr**. fl 

B. N. NPOS. NNEGv LR «©LR) ©tt 

[0120] P1 = NPOS / NNEG
P2 = NPOS / N
P3 = (10 * SUM(LR)) / (NPOS + NNEG)
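
The quantities P1, P2 and P3 above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name, the argument names `diff_threshold` (D) and `ratio_threshold` (R), the use of a base-10 logarithm for LR, and the summation of LR over all probe pairs are all assumptions made for the example. A pair is counted toward NPOS when Ipm - Imm >= D and Ipm/Imm >= R, and toward NNEG when the same tests hold with the roles of the two intensities exchanged.

```python
import math


def expression_statistics(pairs, diff_threshold, ratio_threshold):
    """Compute N, NPOS, NNEG and P1..P3 from (Ipm, Imm) intensity pairs.

    All intensities are assumed to be positive (background-corrected) numbers.
    """
    n = len(pairs)
    npos = 0
    nneg = 0
    sum_lr = 0.0
    for ipm, imm in pairs:
        # Positive pair: perfect match clearly brighter than mismatch.
        if ipm - imm >= diff_threshold and ipm / imm >= ratio_threshold:
            npos += 1
        # Negative pair: mismatch clearly brighter than perfect match.
        elif imm - ipm >= diff_threshold and imm / ipm >= ratio_threshold:
            nneg += 1
        # LR: log ratio of the pair (base 10 is an assumption of this sketch).
        sum_lr += math.log10(ipm / imm)

    # Guard against division by zero; these fallbacks are assumptions.
    p1 = npos / nneg if nneg else math.inf
    p2 = npos / n if n else 0.0
    p3 = (10 * sum_lr) / (npos + nneg) if (npos + nneg) else 0.0
    return {"N": n, "NPOS": npos, "NNEG": nneg, "P1": p1, "P2": p2, "P3": p3}


if __name__ == "__main__":
    example_pairs = [(200, 100), (300, 100), (100, 250), (120, 110)]
    print(expression_statistics(example_pairs, diff_threshold=50, ratio_threshold=1.5))
```

The fallback values used when NNEG or NPOS + NNEG is zero are choices made for this sketch only; the text does not state how such cases are handled.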

[0121] cnsopflteflH*^ m&=f-immhxi 

[0122] «W©fc»K, P**«.» < ->*0«BK» 
«*■&. Pl*»2. l«J:"C**l«. a*s*-c*^. p 

1*52. ljfinrci. 8KJhr**itf. B*5«-c*S. 
•5-n«^©Ji^«:B. CaMtC**. BP^> Pltt - 
A. B. C©3-o©iBH«:^«i|3nS. CtxB. IMH© 

50 ©i^iCL-C. ±T©Pfi*WT©«fc^«CO <o*>©« 



29 

[0123] A = (P1 >= 2.1)
B = (2.1 > P1 >= 1.8)
C = (P1 < 1.8)

[0124] X = (P2 >= 0.35)
Y = (0.35 > P2 >= 0.20)
Z = (P2 < 0.20)

[0125] Q = (P3 >= 1.5)
R = (1.5 > P3 >= 1.1)
S = (P3 < 1.1)

( o 1 2 6 ] ±m<D7-)\,mm^T pmzmmom® 






* [0 1 2 7 ] CCT. ae^D^^Stt. ^ (s 
St) . 4'SS> 4 «WLJati») ©t> 

m*-C^$n*. 5$ U and (X or Y) and (Q or R) 

?*l*. sl^i^i. Pl>=2. 1. P2> = 0. 
2 0 . P 3 > = 1 . l *Jg£0 £^J8£{c«. itfc**s|£ 
^LTC^i^^^. *fc. S:r Ba ndXandQ j 

'So 

10 12 8] u±<oist®m^-c, &&*vMmvm 

[0129] 



[0 130] 



[0131] 



4'5 64 



A and (X or Y) and (Q or R) 
B and X and Q 

A and X and S 
B and X and R 
B and Y and (Q or R) 
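
The decision matrix listed above can be sketched as a small classifier. This is an illustration only: the labels attached to the two groups of rows are illegible in this copy of the text, so the mapping used here (first group "expressed", second group "marginal", everything else "absent") is an assumption based on the outcomes named in claim 22. The function names are hypothetical.

```python
def band(value, high, low, labels):
    """Map a value onto one of three labels using two thresholds."""
    if value >= high:
        return labels[0]
    if value >= low:
        return labels[1]
    return labels[2]


def expression_call(p1, p2, p3):
    """Return 'expressed', 'marginal' or 'absent' from P1, P2 and P3."""
    a = band(p1, 2.1, 1.8, "ABC")   # A, B or C
    x = band(p2, 0.35, 0.20, "XYZ")  # X, Y or Z
    q = band(p3, 1.5, 1.1, "QRS")   # Q, R or S

    # First group of rows (assumed to mean "expressed").
    if a == "A" and x in "XY" and q in "QR":
        return "expressed"
    if a == "B" and x == "X" and q == "Q":
        return "expressed"

    # Second group of rows (assumed to mean "marginal").
    if a == "A" and x == "X" and q == "S":
        return "marginal"
    if a == "B" and x == "X" and q == "R":
        return "marginal"
    if a == "B" and x == "Y" and q in "QR":
        return "marginal"

    # Every other combination (assumed to mean "absent").
    return "absent"


if __name__ == "__main__":
    print(expression_call(2.5, 0.30, 1.2))
```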



[0132] Z??79 7 4V, a-yf-ccfcJ^i^ 
ZtlZ. fffr&P. 4'%64«>l,>*?r<,>*M 

[oiss^-co^a-^^MaL-c^ aewg 

W3&*m7r;Ltc&> x^^yg 7 5r. LRfciofe 
bfcflO^ffifcttg?-^ 26ft. NPOSSC/NN 
EG^liJDSif/cT'n-^tcBB-r^ I D I Fffi©^* 

use 4 ttc. ttizomzm^r com 
^mmmcy^mtiim^^^x^K 30 

[0 134 J -,79 7 6 r££Wfcfj;e*fT5„ - 

* <c©*i£, x^»^9 7ortt»snfcii 

[0135] SMteWWcrSfci&ft:, g, 2 0TK--5 
©ae^KMr^^S/c^^^tfc*!. 

*ti.sas!i;©jie^©^i ffi { CC ©^a^ji^-rs c 4 
-es**. se-dt. -- ^oae^o^ffcgg-rs^tt. 

■ffc©?«&(,>. 

[0 136]H2 1tt. ae^BBEly^h^xr© 

m *m-r. ^i"J->f^^i/^iooott, 
?*x$bFmwi oozLT-tm&m 004 so 



40 



C ©&&££>-£) 
©2 o©g|5»{c»fJ$ ft*. XSoHMK^ 
{ 5 a ~^-#^-*£«f^fSS&W4&*-?77#* 

[0 1 3 7] «r©^*y->^x^u-«ow{c^f-. 

«. * safc?{cwr *-o©^$, s t > bis*©*!* 

/w^-y 5. F<^$tifc??7-©-o©^^- 
yy r . ^ < ©jte^tcK-r 5^ < © rn^j 

j 0 1 3 s ] @ 2 2 ». 3*3 tifcae^ojjwjg^ 
[0^39] t- *mwtsm* m&vmmom v 

A 5 */r;3*l-CVS„ r Experiment Name J tt. -e©H® 
IWLTa-if-jijU^ja,^^ rCeneN 

amej tt. *©iie^«0«**»|fer*. TFtositivej 

r Negative! ©j»». S2 OTIft^bfcNPOSS 
VNNEG©ffl^-r„ r pairs J «. ^©jfeT©« 






p-^M©$t£^1". *fc. Tpos FractionJ B; 2(e 

a- (Bpi,, positive/ Pairs ) £7js"3\ 
[0140 ] r A vq Ratio J «. : ® 

v j WB. log (I pa/, I, mm) ©WfcWl*. Ffm 

(m Excess j flB. 3L-!f-*««ei/fcW«J:9** 

■T 02 3K*Sft* Tpos/Neq J «B. 1 Positive] 
ffli r Negative] «©&**? < f Negative] «**feO. 

f j m\t *©ae*KWir*¥$»K£*^"*"- w 

& (BP^. (ID1F)W). 
[014M lAbs CallJ **. *©*»«»**«£ 
^»!l3-*45Vr. C©«B©«IP. M. At*. *tl* 

[0142] »-lMW«*«W**- ^ 7 ' 9 

10 3.4 

4Hv>-C.' ?2*^w : * J * ;sn *' < * - 
ofc%w*£&©^7*'St ,? -e#*. s&k. 3fs© 

[ o 1 4 3 ] a 2 4 wu ***€&*©*«»** 
^sij©x^'j->^-<^^i"'^' r »_ xe7,; "" >f: 30 

KB StaSttBKfeW-S^^^ 0 -^ 4 ^^^ 
0-7©>W^'J? K^J&5$^©Jt©^7#^3*i 
TO*. x*B*»ttB*. y«»> /W7-U»F*« 

DWv hSn-C^S* 5 . -COW-CB. HISB1. 2-C 
fei>o a-tf-B. C©W74»W. Sii<fc0±R 
CTF©^o-^©* ( I pm/ 1 ») SStrtSCiAi 

40 

I o l 4 4 ] 0 2 5 B. SftS titeWW)»e* K «T 

•T 160KB. 

1 0 6 2 fc^-*«5HW*l 1 6 4#£* 

w-cb stR* n/c«ft©se^©^* 5 Bo • o t 
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[ 0 1 4 5 ] S 2 6 B, StRS hfcWH>«BTfc|lT 

**iw»©H«**rt"JM«>* jry-vf^x/n* 

a«R«M 12 0 2 KB. ^- *«ft«W 1 2 0 4^S 

* »58ft^J6WCB. JSBU^B. ^ 
■cksnS (H2 0© (ID IF) C<D 

[ 0 1 4 6 1 m 2 7 B, WK3 Mctimmk+K-Wt 
-D^©^7©^-c«^-Ti. SP,K3"J©x*'J->r 
Wi*aHl 1 2 3 2 KB. 9 s - 

KBi-sapi'**** 7 12 3 6. wswafe^a^ 

1 2 3 8. Rtf . >W *'J * FiMWW 71240 

[014710 2 8 RtfH 2 9 B. »agj£3£r-.* 

jt i 4Jt«-f 5 C i CC J: D . «e^©^« 

(»») J B. ^^i(.-Cffl«-^^^*>®' C$, ^ ,Ct 

[ 0 1 4 8 3 * ? v 1 1 3.0 2 -C. ..=• > f » - * 
AK 8^e»ff*=»tifcNM©^^^a-^iT>*f 

■c*u. isi c < a»*6»e»nfc^»^a-^ -( 
^r«j 9 p^isafiK« i ™t?^-r. *f 1 3 o 4-c. 

[0 149]-*. W^130«t. 

tt©^tf£?B-7i*tt^a-^©£©^f- 

ftJ-ldW-. Xf-^1 308^. ->>F 
©{i^ta ; Sr^©**^*^- a, ®'" f ^ ,J 9 hME 

[0150]>XK. Xf^l3 10-C. I-J«©« 
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©wj m®mzmm® (dd i d ^ 
tt<omm (rd i FYtttmz. Mm™. *>m 
( j pin- j m ) <o» ^'j, mmsit < i pm 

I bit.) ©A^.j , h-mm<D£i)i£ft®m&itV 

[0152] (Jpm-Jntn)- ( I pm- I mm) > = D D 
IF R-o. (Jpm-Jn»,)/(lp n ,- Inro) > = RD 
I FV&tfLtZmiCK. Xr-v fi 3 1 4TN I NC 

zimasvz. -mc, nincb. mmktt<D7u 

r0 153]*f^i3 16T?. (Jpn-j™) - 
( I pm- I itm) > = DD I F, lo, ( j ^ j m) / 
( I pm- I m) >= R D I Fifiti&tZfrt^frcWfe 
C-<MW«5'S^ B . NDEC*1 
Hatt, Nb E ;c«. ^fto^n-^ 

ojD-^{c ML/ -c vN p OS NNEG Rt ;LRm 

MLXmttLtc. ^^©ffiog^Bast^JE* 

yx, zvmmmmvbofrmmmbvfr* 
I6';i5 5]%tc. «iattH2 9{c^or. xf^i 




©?**. InftXItMrtt. 4Hn W{CHl/t 
■2 0©^^9724974*Wfr*CiKJ: 

[o 15 6] ^f^i3 2 6-c«. mm%m 
tor, z^mm<»m^m<mm^. com 

ShWCtt. ±fB©^{c|tgsnfcfflN. NPOS 
B, NPOSE, NNEGB, NNEGE. NINC 

N I NC#NDEC^r*S*tf 9j Wcj:->T 

t0 157]NINC>=NDEC©»^(Ctt. ^4, 

[0158] P1 = NINC / NDEC
P2 = NINC / N
P3 = ((NPOSE - NPOSB) - (NNEGE - NNEGB)) / N
P4 = 10 * SUM(LRE - LRB) / N
10 15 9] Ctit><DPmzm>X. 2o©*|«:ffl(nig 

[0161] A = (P1 >= 2.8)
B = (2.8 > P1 >= 2.0)
C = (P1 < 2.0)
[0162] X = (P2 >= 0.34)
Y = (0.34 > P2 >= 0.24)
Z = (P2 < 0.24)
[0163] M = (P3 >= 0.20)
N = (0.20 > P3 >= 0.12)
O = (P3 < 0.12)
[0164] Q = (P4 >= 0.9)
R = (0.9 > P4 >= 0.5)
S = (P4 < 0.5)

[0 166] Cfl^, NlNO = ND-ECr*4fc 
[0167]

A and (X or Y) and (Q or R) and (M or N or O)
A and (X or Y) and (Q or R or S) and (M or N)
B and (X or Y) and (Q or R) and (M or N)
A and X and (Q or R or S) and (M or N or O)
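
The four rows above can be checked mechanically, as in the following sketch. It only reports whether a set of values P1 to P4 matches one of the listed rows for the case NINC >= NDEC; the call that these rows correspond to is not legible in this copy of the text, so no label is attached to the result. The function names and the table layout are hypothetical.

```python
def band(value, high, low, labels):
    """Map a value onto one of three labels using two thresholds."""
    if value >= high:
        return labels[0]
    if value >= low:
        return labels[1]
    return labels[2]


# Each row lists the allowed bands for P1, P2, P4 and P3, in the order
# in which the rows are written in the text.
LISTED_ROWS = [
    ("A", "XY", "QR", "MNO"),
    ("A", "XY", "QRS", "MN"),
    ("B", "XY", "QR", "MN"),
    ("A", "X", "QRS", "MNO"),
]


def matches_listed_rows(p1, p2, p3, p4):
    """Return True when (P1, P2, P3, P4) satisfies any of the listed rows."""
    a = band(p1, 2.8, 2.0, "ABC")
    x = band(p2, 0.34, 0.24, "XYZ")
    m = band(p3, 0.20, 0.12, "MNO")
    q = band(p4, 0.9, 0.5, "QRS")
    return any(
        a in row_a and x in row_x and q in row_q and m in row_m
        for row_a, row_x, row_q, row_m in LISTED_ROWS
    )


if __name__ == "__main__":
    print(matches_listed_rows(3.0, 0.30, 0.05, 1.0))
```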
1 Title of Invention

COMPUTER-AIDED TECHNIQUES FOR ANALYZING BIOLOGICAL SEQUENCES

2 Claims 

- ~* --rearer of ° -* 
-»b aS ep Ml r""° 6 " ,, * i " 8kb ~ ta,, " s • — 

3> The method of daim 1 fn.*k« 
d ^"8»^ico„ whichwlttn J^\^^ c ° m P'"">g the step of 
calls atead, baseposiaonto bedisplay^ ^ ^' Polity of ba SG 

^ The method of claim i r. . . 
, 5 ^ method of claim 1 

displaying the plurality of base calls at each h ^ <*>P „f 

^ing to basepcsiL. ^ ^ *• W ceil, 

6- The method of claim c: <■ 

«y probe and the Mmpte nucIak ^ 
wherein each base call is determined by an analysis of the hybridization intensities.

7. In a computer system, a method of calling an unknown base in 

a sample nucleic acid sequence, the method comprising the steps of: 

inputting a plurality of hybridization intensities for a plurality of sets of nucleic acid

probes, each hybridization intensity indicating a hybridization affinity between a
nucleic acid probe and the sample nucleic acid sequence;

computing a base call for the unknown base for each set of probes; and
computing a single base call for the plurality of sets of probes
according to the base call for the unknown base which occurs most often for the
plurality of sets of probes.

8. The method of claim 7, wherein each set of probes was
generated according to a same reference sequence.

9. The method of claim 7, further comprising the step of checking
exception rules that specify the single base call for the plurality of sets of nucleic
acid probes under certain conditions.

10. In a computer system, a method of dynamically changing
parameters for a computer-implemented base calling procedure, the method
comprising the steps of:

generating base calls for at least a portion of a sample nucleic acid
sequence utilizing the base calling procedure, the base calling procedure including a
parameter that is changeable by a user;

displaying the base calls for the at least a portion of a sample nucleic
acid sequence;

displaying the parameter of the base calling procedure;

^ ^ ^ ft portion « m? * 

nucleic acid sequence utilizing the base calling procedure and the new value for the
parameter; and ^ ^ ^ ^ ^ fa ^ ^ ^ , ^ rf . sample 

nucleic acid sequence. 

11. The method of claim 10, further comprising the step of
dfap ,ayin g aplu^ 
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12 ' method of data 10 wh^ m „ ^ 

aad sequence; probes and ** ^mple nucleic 

^uence; and ^ eXpreSS1 °" caU ° f «* »«nple nudeie acid 

displaying fte gene expression call. 

15. The method of claim ia *i_ 
acid sequence; prooes and the sample nucleic 
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probes; and ^ ^ exprejjion caU of ^ ^fe nucleic add sequence. 

19. The method of claim 18, further comprising the step of
comparing a difference between hybridization intensities of perfect match and
mismatch probes at a base position to a difference threshold.

20. The method of claim 18, further comprising the step of
comparing a quotient of hybridization intensities of perfect match and mismatch
probes at a base position to a ratio threshold.

21. The method of claim 18, further comprising the step of utilizing
a decision matrix to determine the gene expression call.

22. The method of claim 18, wherein the gene expression call is 
selected from the group consisting of expressed, marginal, and absent. 

23. In a computer system, a method of monitoring change in
expression of a gene in a sample nucleic acid sequence, the method comprising the
steps of:

inputting a plurality of hybridization intensities of pairs of perfect
match and mismatch probes, the perfect match probes being perfectly
complementary to the gene and the mismatch probes having at least one base
mismatch with the gene, and the hybridization intensities indicating hybridization
affinity between the perfect match and mismatch probes and the sample nucleic
acid sequence;

comparing the hybridization intensities of each pair of perfect match
and mismatch probes in order to generate a gene expression level of the sample
nucleic acid sequence;

determining a change in expression by comparing the gene expression
level to a baseline gene expression level; and

displaying the change in expression of the gene in the sample nucleic
acid.

24. The method of claim 23, wherein the change in expression is 
displayed as a graph. 

25. The method of claim 23, further comprising the step of
generating the baseline expression level according to the inputting and comparing
steps of claim 23.

Pnfect match a „ d mfc-aH, ,. J' VtadiabTO intensities rf 

nucleic acid. expression of the gen e in the sample 



steps of: ^ente, uie method comprising the 

complementary tb the gene and tnTJ^l ^ ? being perfect,,, 

Mismatch wi* the " * " 1 \ * « T ^ ^ at *■* °<« 
infinity betweehm! X ^ ^ h < vbridiz *«- 
add sequence; ^ " ^ ^ <" d *e sampJe nucleic 

Probes in o^Z^l^ T " ^ «~ * -* 
sequence; and ? '"T^* 011 feveI of ■? —I* nucleic acid 

^toahaJn^T^t™ 
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steps of claim 30. 

32. The method of claim 30, further comprising the step of
comparing hybridization intensities of perfect match and mismatch probes
hybridizing with the sample nucleic acid sequence and hybridization intensities of
perfect match and mismatch probes hybridizing with a baseline sequence to a
difference threshold.

33. The method of claim 30, further comprising the step of
comparing hybridization intensities of perfect match and mismatch probes
hybridizing with the sample nucleic acid sequence and hybridization intensities of
perfect match and mismatch probes hybridizing with a baseline sequence to a ratio
threshold.

34. The method of claim 30, further comprising the step of utilizing
a decision matrix to determine the change in expression of the gene in the sample
nucleic acid.

35. The method of claim 30, wherein the change in expression of
the gene in the sample nucleic acid is selected from the group consisting of
increased, marginal increase, decreased, marginal decrease, and no change.

3 Detailed Description of Invention 

The present invention relates to the field of computer systems. More 
specifically, the present invention relates to computer systems for analyzing 
biological sequences such as nucleic acid sequences. 

Devices and computer systems for forming and using arrays of
materials on a substrate are known. For example, PCT application WO92/10588,
incorporated herein by reference for all purposes, describes techniques for
sequencing or sequence checking nucleic acids and other materials. Arrays for
performing these operations may be formed according to the methods of,
for example, the pioneering techniques disclosed in U.S. Patent No. 5,143,854 and
U.S. Patent Application No. 08/249,188, both incorporated herein by reference for
all purposes.

According to one aspect of the techniques described therein, an array
of nucleic acid probes is fabricated at known locations on a substrate or chip. A
various genes present in the respective virus. Detection of elevated expression
levels of characteristic viral genes provides an effective diagnostic of the disease
state. In particular, viruses such as herpes simplex enter quiescent states for
periods of time only to erupt in brief periods of rapid replication. Detection of
expression levels of characteristic viral genes allows detection of such active
proliferative (and presumably infective) states.

SUMMARY OF THE INVENTION 
The present invention provides innovative systems and methods for
analyzing biological sequences such as nucleic acid sequences. The computer
system may analyze hybridization intensities indicating hybridization affinity
between nucleic acid probes and a sample nucleic acid sequence in order to call
bases in the sample sequence. Multiple base calls may be combined to form a
single base call. Additionally, the computer system may analyze hybridization
intensities in order to monitor gene expression or the change in gene expression as
compared to a baseline.

According to one aspect of the invention, a computer-implemented
method of calling an unknown base in a sample nucleic acid sequence comprises the
steps of: receiving hybridization intensities for a plurality of sets of nucleic acid
probes, each hybridization intensity indicating a hybridization affinity between a
nucleic acid probe and the sample nucleic acid sequence; computing a base call for
the unknown base for each set of probes; and computing a single base call for the
plurality of sets of probes according to the base call for the unknown base which
occurs most often for the plurality of sets of probes. Typically, the single base call
is displayed on a screen display and a user is afforded the opportunity to display or
not display the base calls from which the single base call is derived.
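
The step of combining multiple base calls into a single base call can be sketched as a simple majority vote. This is a minimal illustration, not the implementation of the invention: the function name is hypothetical, and the handling of ties (the call seen first among equally frequent calls is returned) is an assumption, since the passage above does not say how ties are resolved. The exception rules mentioned in claim 9 are not modeled.

```python
from collections import Counter


def consensus_base_call(base_calls):
    """Return the base call that occurs most often among the given calls.

    `base_calls` holds one call per set of probes, for example ["A", "A", "G"].
    """
    if not base_calls:
        raise ValueError("at least one base call is required")
    counts = Counter(base_calls)
    # most_common(1) yields the single most frequent call and its count.
    best_base, _ = counts.most_common(1)[0]
    return best_base


if __name__ == "__main__":
    print(consensus_base_call(["A", "A", "G"]))
```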

According to another aspect of the invention, a method of dynamically
changing parameters for a computer-implemented base calling procedure comprises
the steps of: generating base calls for at least a portion of a sample nucleic acid
sequence utilizing the base calling procedure, the base calling procedure including a
parameter that is changeable by a user; displaying the base calls for the at least a
portion of a sample nucleic acid sequence; displaying the parameter of the base
DESCRIPTION OF PREFERRED EMBODIMENTS

General

The present invention provides innovative methods of identifying
nucleotides (i.e., base calling) in sample nucleic acid sequences and monitoring gene
expression. In the description that follows, the invention will be described in
reference to preferred embodiments. However, the description is provided for
purposes of illustration and not for limiting the spirit and scope of the invention.

Fig. 1 illustrates an example of a computer system that may be used to
execute software embodiments of the present invention. Fig. 1 shows a computer
system 1 which includes a monitor 3, screen 5, cabinet 7, keyboard 9, and mouse 11.
Mouse 11 may have one or more buttons such as mouse buttons 13. Cabinet 7
houses a CD-ROM drive 15 and a hard drive (not shown) that may be utilized to
store and retrieve software programs including computer code incorporating the
present invention. Although a CD-ROM 17 is shown as the computer readable
medium, other computer readable media including floppy disks, DRAM, hard
drives, flash memory, tape, and the like may be utilized. Cabinet 7 also houses
familiar computer components (not shown) such as a processor, memory, and the
like.

Fig. 2 shows a system block diagram of computer system 1 used to
execute software embodiments of the present invention. As in Fig. 1, computer
system 1 includes monitor 3 and keyboard 9. Computer system 1 further includes
subsystems such as a central processor 50, system memory 52, I/O controller 54,
display adapter 56, removable disk 58, fixed disk 60, network interface 62, and
speaker 64. Removable disk 58 is representative of removable computer readable
media like floppies, tape, CD-ROM, removable hard drive, flash memory, and the
like. Fixed disk 60 is representative of an internal hard drive or the like. Other
computer systems suitable for use with the present invention may include
additional or fewer subsystems. For example, another computer system could
include more than one processor 50 (i.e., a multi-processor system) or memory
cache.

Arrows such as 66 represent the system bus architecture of computer 
running Windows NT including appropriate memory and a CPU as shown in Figs.
1 and 2. The computer system 100 obtains inputs from a user regarding
characteristics of a gene of interest, and other inputs regarding the desired features
of the array. Optionally, the computer system may obtain information regarding a
specific genetic sequence of interest from an external or internal database 102 such
as GenBank. The output of the computer system 100 is a set of chip design
computer files 104 in the form of, for example, a switch matrix, as described in PCT
application WO 92/10092, and other associated computer files.

The chip design files are provided to a system 106 that designs the
lithographic masks used in the fabrication of arrays of molecules such as DNA.
The system or process 106 may include the hardware necessary to manufacture
masks 110 and also the computer hardware and software 108 necessary to
lay the mask patterns out on the mask in an efficient manner. As with the other
features in Fig. 3, such equipment may or may not be located at the same physical
site, but is shown together for ease of illustration in Fig. 3. The system 106
generates masks 110 or other synthesis patterns such as chrome-on-glass masks for
use in the fabrication of polymer arrays.

The masks 110, as well as selected information relating to the design of
the chips from system 100, are used in a synthesis system 112. Synthesis system
112 includes the necessary hardware and software used to fabricate arrays of
polymers on a substrate or chip 114. For example, synthesizer 112 includes a light
source 116 and a chemical flow cell 118 on which the substrate or chip 114 is placed.
Mask 110 is placed between the light source and the substrate/chip, and the two
are translated relative to each other at appropriate times for deprotection of selected
regions of the chip. Selected chemical reagents are directed through flow cell 118
for coupling to deprotected regions, as well as for washing and other operations.
All operations are preferably directed by an appropriately programmed computer
119, which may or may not be the same computer as the computer(s) used in mask
design and mask making.

The substrates fabricated by synthesis system 112 are optionally diced
into smaller chips and exposed to marked targets. The targets may or may not be
complementary to one or more of the molecules on the substrate. The targets are
is a probe that will ideally hybridize with the reference sequence and thus a wild-type
gene (also called the chip wild-type) would ideally hybridize with wild-type
probes on the chip. The target sequence is substantially similar to the reference
sequence except for the presence of mutations, insertions, deletions, and the like.
The layout implements desired characteristics such as arrangement on the chip that
permits "reading" of genetic sequence and/or minimization of edge effects, ease of
synthesis, and the like.

Fig. 5 illustrates the global layout of a chip. Chip 114 is composed of 
multiple units where each unit may contain different tilings for the wild-type 
sequence or multiple wild-type sequences. Unit 1 is shown in greater detail and 
shows that each unit is composed of multiple cells which are areas on the chip that 
may contain probes. Conceptually, each unit includes multiple sets of related cells. 
As used herein, the term cell refers to a region on a substrate that contains many 
copies of a molecule or molecules (e.g., nucleic acid probes).

Each unit is composed of multiple cells that may be placed in rows (or
"lanes") and columns. In one embodiment, a set of five related cells includes the
following: a wild-type cell 220, "mutation" cells 222, and a "blank" cell 224. Cell
220 contains a wild-type probe that is the complement of a portion of the wild-type
sequence. Cells 222 contain "mutation" probes for the wild-type sequence. For
example, if the wild-type probe is 3'-ACGT, the probes 3'-ACAT, 3'-ACCT, 3'-ACGT,
and 3'-ACTT may be the "mutation" probes. Cell 224 is the "blank" cell because it
contains no probes (also called the "blank" probe). As the blank cell contains no
probes, labeled targets should not bind to the chip in this area. Thus, the blank cell
provides an area that can be used to measure the background intensity.

Again referring to Fig. 4, at step 206 the masks for the synthesis are
designed. At step 208 the software utilizes the mask design and layout
information to make the DNA or other polymer chips. This software 208 will
control, among other things, relative translation of a substrate and the mask, the
flow of desired reagents through a flow cell, the synthesis temperature of the flow
cell, and other parameters. At step 210, another piece of software is used in
scanning a chip thus synthesized and exposed to a labeled target. The software
controls the scanning of the chip, and stores the data thus obtained in a file that may
later be utilized to extract sequence information.

At step 212 a computer system utilizes the layout information and the
fluorescence information to evaluate the hybridized nucleic acid probes on the chip.
Among the important pieces of information obtained from DNA chips are the
identification of mutant targets and determination of genetic sequence of a
particular target.

Fig. 6 illustrates the binding of a particular target DNA to an array of
DNA probes 114. As shown in this simple example, the following probes are
formed in the array (only one probe is shown for the wild-type probe):

3'-AGAACGT
AGACCGT 
AGAGCGT 
AGATCGT 



As shown, the probes in the set differ by only one base, a single base mismatch at an
interrogation position, so the probes are designed to determine the identity of the
base at that location in the nucleic acid sequence. Accordingly, when used herein, a
unit will refer to multiple sets of related probes, where each set includes probes that
differ by a single base mismatch at an interrogation position.

When a fluorescein-labeled (or other marked) target with the sequence
5'-TCTTGCA is exposed to the array, it is complementary only to the probe
3'-AGAACGT, and fluorescein will be primarily found on the surface of the chip
where 3'-AGAACGT is located. Thus, for each set of probes that differ by only one
base, the image file will contain four fluorescence intensities, one for each probe.
Each fluorescence intensity can therefore be associated with the nucleotide or base
of each probe that is different from the other probes. Additionally, the image file
will contain a "blank" cell which can be used as the fluorescent intensity of the
background. By analyzing the five fluorescence intensities associated with a
specific base location, it becomes possible to extract sequence information from such
arrays using the methods of the invention disclosed herein.
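
The Fig. 6 example can be reproduced with a short sketch. The helper names below are hypothetical. The first two functions follow the text directly: a probe set differs only at the interrogation position, and a probe written 3' to 5' pairs base by base with a target written 5' to 3'. The third function is only an assumed, simplified way of using the five intensities (subtract the blank cell, take the brightest probe); it is not the base calling procedure of the invention, which is described elsewhere.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_set(wild_type_probe, position):
    """Return the four probes that differ only at the interrogation position."""
    return [
        wild_type_probe[:position] + base + wild_type_probe[position + 1:]
        for base in BASES
    ]


def complementary_probe(target_5_to_3):
    """Return the probe (written 3'->5') that pairs with a 5'->3' target."""
    return "".join(COMPLEMENT[base] for base in target_5_to_3)


def call_base(intensities, blank):
    """Assumed simplification: background-correct and pick the brightest probe.

    `intensities` maps the probe base at the interrogation position to the
    measured fluorescence; `blank` is the intensity of the blank cell.
    The returned value is the target base, i.e. the complement of the
    brightest probe's base.
    """
    corrected = {base: value - blank for base, value in intensities.items()}
    probe_base = max(corrected, key=corrected.get)
    return COMPLEMENT[probe_base]


if __name__ == "__main__":
    probes = probe_set("AGAACGT", 3)
    print(probes)
    print(complementary_probe("TCTTGCA") in probes)
    print(call_base({"A": 900, "C": 50, "G": 60, "T": 40}, blank=30))
```

With the target 5'-TCTTGCA, the only probe in the set that is fully complementary is 3'-AGAACGT, which matches the description of where the fluorescein is found.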

Fig. 7 illustrates probes arranged in lanes on a chip. A reference 
sequence (or chip wild-type sequence) is shown with five interrogation positions 
marked with number subscripts. An interrogation position is oftentimes a base
position in the reference sequence where the target sequence may contain a
mutation or otherwise differ from the reference sequence. The chip may contain
five probe cells that correspond to each interrogation position. Each probe cell
contains a set of probes that have a common base at the interrogation position. For
example, at the first interrogation position, I1, the reference sequence has a base T.
The wild-type probe for this interrogation position is 3'-TGAC where the base A in
the probe is complementary to the base at the interrogation position in the reference
sequence.

Similarly, there are four "mutant" probe cells for the first interrogation
position, I1. The four mutant probes are 3'-TGAC, 3'-TGCC, 3'-TGGC, and 3'-TGTC.
Each of the four mutant probes varies by a single base at the interrogation position.
As shown, the wild-type and mutant probes are arranged in lanes on the chip.
One of the mutant probes (in this case 3'-TGAC) is identical to the wild-type probe
and therefore does not evidence a mutation. However, the redundancy gives a
visual indication of mutations as will be seen in Fig. 8.

Still referring to Fig. 7, the chip contains wild-type and mutant probes
for each of the other interrogation positions I2-I5. In each case, the wild-type probe
is equivalent to one of the mutant probes.
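
The tiling of one interrogation position can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `probe_set` and the convention of passing the reference segment 5'->3' and returning probes written 3'->5' are assumptions made for the example, not part of the described system.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def probe_set(reference_segment: str, interrogation_index: int) -> dict:
    """Return the four related probes for one interrogation position.

    reference_segment is read 5'->3'; each probe is written 3'->5' so that
    it lines up base-for-base with the reference. The dictionary key is the
    base the probe carries at the interrogation position (hypothetical layout).
    """
    wild_type = [COMPLEMENT[base] for base in reference_segment]
    probes = {}
    for base in "ACGT":
        probe = list(wild_type)
        probe[interrogation_index] = base  # substitute at the interrogation position
        probes[base] = "".join(probe)
    return probes


# Reference segment ACTG has a T at index 2, as in the Fig. 7 example.
print(probe_set("ACTG", 2))
# {'A': 'TGAC', 'C': 'TGCC', 'G': 'TGGC', 'T': 'TGTC'}
```

The probe keyed by 'A' equals the wild-type probe 3'-TGAC, which reproduces the redundancy between the wild-type lane and one mutant lane noted above.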

Fig. 8 illustrates a hybridization pattern of a target on a chip with a 
reference sequence as in Fig. 7. The reference sequence is shown along the top of 
the chip for comparison. The chip includes a WT-lane (wild-type), an A-lane, a
C-lane, a G-lane, and a T-lane (or U). Each lane is a row of cells containing probes.
The cells in the WT-lane contain probes that are complementary to the reference
sequence. The cells in the A-, C-, G-, and T-lanes contain probes that are
complementary to the reference sequence except that the named base is at the
interrogation position.

In one embodiment the hybridization of probes in a cell is determined
by the fluorescent intensity (e.g., photon counts) of the cell resulting from the
binding of marked target sequence. The fluorescent intensity may vary greatly
among cells. For simplicity, Fig. 8 shows a high degree of hybridization by a cell
containing a darkened area. The WT-lane allows a simple visual indication that
there is a mutation at an interrogation position because the wild-type cell is not dark
at that position. The cell in the C-lane is darkened which indicates that the
mutation is from T->G (mutant probe cells are complementary so the C-cell
indicates a G mutation). In a preferred embodiment, the WT-lane is not utilized
so four cells (not including any "blank" cell) are utilized to call a base at an
interrogation position.

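In practice, the fluorescent intensities of cells near an interrogation
position having a mutation are relatively dark, creating "dark regions" around a
mutation. The lower fluorescent intensities result because the cells at
interrogation positions near a mutation do not contain probes that are perfectly
complementary to the target sequence; thus, the hybridization of these probes with
the target sequence is lower. For example, the relative intensity of the cells at
interrogation positions adjacent to a mutation may be relatively low because none of
the probes therein are complementary to the target sequence. Although the lower
fluorescent intensities reduce the resolution of the data, the methods of the present
invention provide highly accurate base calling within the dark regions around a
mutation and are able to identify other mutations within these regions.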

Fig. 9 illustrates standard and alternate tilings on a chip. As shown,
a chip includes twelve units. Several of the units are tiled (designed and
synthesized on the chip) to include probes complementary to the same reference
sequence. For identification purposes, this group of units will be called the
standard group. In general, base calls for the target sequence are performed
utilizing the standard group unless the invention determines that another group or
groups should be utilized.

Other units are tiled to include probes complementary to the same
reference sequence, but a reference sequence that differs from the reference
sequence for the standard group. This group of units will be called the first alternate
group. Unit 12 comprises another alternate group that is based on a reference
sequence that is different from the reference sequences of the standard and first
alternate groups. Although the reference sequences are different, they are often
quite similar. For example, the reference sequences may be slightly different
mutations of HIV. Embodiments of the present invention evaluate and utilize
information from tilings based on reference sequences that would typically not be
used in base calling the target sequence.

The units within a group may include identical probes, probes of
different structure, probes from the same or different chips, and the like. For
example, one unit may include 5-mer probes with the interrogation position at the
third position in the probes. Another unit may include 10-mer probes with an
interrogation position at the sixth position. Additionally, these units may have
been tiled on the same or different chips.

The expanded section at the bottom left portion of Fig. 9 illustrates 
that each block of a unit typically includes four cells, denoted A, C, G, and T. The 
base designations specify which base is at the interrogation position of each probe 
within the cell. Typically, there are hundreds or thousands of identical nucleic 
probes within each cell. 

Although in preferred embodiments the cells may be arranged 
adjacent to each other in sequential order along the reference sequence, there is no 
requirement that the cells be in any particular location as long as the location on the 
chip is determinable. Additionally, although it may be beneficial to synthesize the 
different groups on a single chip for consistency of experiments, the methods of the 
present invention may be advantageously utilized with data from different tilings 
on different chips. 

Analyzing Target Sequences

Fig. 10 shows a screen display of hybridization intensities from a chip. 
During analysis, the system receives an image file including the scanned image of 
the hybridized chip. In a preferred embodiment, the image file shows fluorescent
intensities and the locations where labeled target nucleic acid sequences or fragments
bound to the chip.

A screen display 260 utilizes the common windowing graphical user
interface. The user may select to display the image file for inspection. After the
user selects the image file to be displayed, a window 262 is displayed that includes
the image file. The image file shown includes multiple rows of A-, C-, G-, and T-lanes.



As the user moves the cursor over the displayed image file, a status
bar 264 indicates the X and Y position of the cursor and the fluorescent intensity at
that position. Additionally, the user is able to utilize the pointing device to select a
rectangular area of the image file in order to manipulate the subimage. For
example, the user may magnify the subimage so that the individual cells may be
seen more clearly. Additionally, the user may adjust the contrast of the intensities
to bring to light some difference in hybridization intensity that is not apparent at
the current contrast setting.

Fig. 11 is a flowchart of a process of computing a base call from
hybridization intensities of related probes. When used herein, "related probes" are
probes that differ by a nucleotide base at an interrogation position. Although
typically the probes are identical except at the interrogation position, the probes
may differ at other base positions as well. Accordingly, the related probes differ
by at least one base.

At step 302 the hybridization intensities of the four related probes are
adjusted by subtracting the background or "blank" cell intensity. Preferably, if a
hybridization intensity is then less than or equal to zero, the hybridization intensity
is set equal to a small positive number to prevent division by zero or negative
numbers in future calculations.

At step 304, the hybridization intensities of the related probes are sorted by
intensity. The highest intensity is then compared to a predetermined background
difference cutoff at step 306. The background difference cutoff is a number that
specifies the hybridization intensity by which the highest intensity probe must exceed
the background intensity in order to correctly call the unknown base. Thus, the
background adjusted base intensity must be greater than the background difference
cutoff or the unknown base is deemed to be not accurately callable.

If the highest hybridization intensity of the related probes is not
greater than the background difference cutoff, the unknown base is assigned the
code 'N' (insufficient intensity) as shown at step 308. Otherwise, the ratio of the
highest hybridization intensity and second highest hybridization intensity is
calculated as shown at step 310.

At step 312, the ratio calculated at step 310 is compared to a 
predetermined ratio cutoff. The ratio cutoff is a number that specifies the ratio 
required to identify the unknown base. In preferred embodiments, the ratio cutoff
is 1.2. If the ratio is greater than the ratio cutoff, the unknown base is called
according to the probe with the highest hybridization intensity. Typically, the base
is called as the complement of the base at the interrogation position in the highest
intensity probe as shown at step 314. Otherwise, the ratio of the second highest
hybridization intensity and third highest hybridization intensity is calculated as
shown at step 316.

At step 318, the ratio calculated at step 316 is compared to the ratio
cutoff. If the ratio is greater than the ratio cutoff, the unknown base is called as
being an ambiguity code specifying the complements of interrogation position bases
of the highest hybridization intensity probe and the second highest hybridization
intensity probe as shown at step 320. Otherwise, the ratio of the third highest
hybridization intensity and fourth highest hybridization intensity is calculated as
shown at step 322.

At step 324, the ratio calculated at step 322 is compared to the ratio 
cutoff. If the ratio is greater than the ratio cutoff, the unknown base is called as 
being an ambiguity code specifying the complements of interrogation position bases 
of the highest, second highest and third highest hybridization intensity probes as 
shown at step 326. Otherwise, the unknown base is assigned the code 'X' 
(insufficient discrimination) as shown at step 328. 
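
The ratio-based procedure of Fig. 11 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the patented implementation: the function name, the dictionary layout keyed by the probe's interrogation base, the `floor` value, and the use of IUPAC letters for the ambiguity codes are all assumptions (the text says only "ambiguity code"). The background difference cutoff has no default because the text gives no value for it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# IUPAC ambiguity letters (assumed encoding; the text does not name one).
IUPAC = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
}


def call_base_by_ratio(intensities, background, background_diff_cutoff,
                       ratio_cutoff=1.2, floor=1e-6):
    """Call a base from four related-probe intensities (keys: probe base)."""
    # Step 302: subtract background; clamp non-positive values to a small number.
    adjusted = {}
    for base, value in intensities.items():
        diff = value - background
        adjusted[base] = diff if diff > 0 else floor

    # Step 304: sort from highest to lowest intensity.
    ranked = sorted(adjusted.items(), key=lambda item: item[1], reverse=True)

    # Steps 306-308: the brightest probe must exceed the background cutoff.
    if ranked[0][1] <= background_diff_cutoff:
        return "N"  # insufficient intensity

    # Steps 310-326: compare successive ratios (1st/2nd, 2nd/3rd, 3rd/4th).
    for k in range(1, len(ranked)):
        if ranked[k - 1][1] / ranked[k][1] > ratio_cutoff:
            called = frozenset(COMPLEMENT[base] for base, _ in ranked[:k])
            return next(iter(called)) if k == 1 else IUPAC[called]

    return "X"  # step 328: insufficient discrimination


print(call_base_by_ratio({"A": 120, "C": 20, "G": 15, "T": 12},
                         background=10, background_diff_cutoff=20))  # T
```

The called base is the complement of the brightest probe's interrogation base, so a bright A probe yields a 'T' call.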

Fig. 12 is a flowchart of another process of computing a base call from
hybridization intensities of related probes. The flowchart shown operates on
hybridization intensities demonstrated by related probes; thus, a base call is made
for the base in the target corresponding to the interrogation position in probes that
differ by a single base mismatch at the interrogation position. At step 402, the
system determines if there is one probe with the highest hybridization to the target
sequence. If there is not, the base is called as an 'N' meaning ambiguous. For
example, if two probes have the same highest intensity (there is a tie), the base
would be called as 'N'.

If there is a single probe that has the highest hybridization to the target,
the base is called according to that probe at step 406. Since the probes are
complementary to the target sequence, the base may be called as the complementary
base (C/G, A/T) to the base at the interrogation position of the probe.

At step 408, the system determines if the base call is a mutant,
meaning it is different than the base in the reference sequence. If the base call is
not a mutant base call, the base call has been made. Otherwise, the system
checks to make sure certain "mutant" conditions are met at step 410 or
the base is called as 'N' at step 412.

Before describing the mutant conditions for one embodiment, it may
be beneficial to give labels to the hybridization intensities of the related probes.
For illustration purposes, "HighInt" will refer to the highest hybridization intensity,
"SecondInt" will refer to the second highest hybridization intensity, "ThirdInt" will
refer to the third highest hybridization intensity, and "LowInt" will refer to the
lowest hybridization intensity.

In one embodiment the mutant conditions include three tests that
must all be met to call the base a mutant. A first test is whether the difference
between HighInt and SecondInt is greater than a difference cutoff. Thus, the
system determines if HighInt - SecondInt is greater than a predefined value. This
value should be chosen to allow mutant base calls only when the highest
hybridization intensity is greater than the next highest hybridization intensity by a
desired amount.

A second test is whether a first ratio is less than a first ratio cutoff.
The first ratio is the following:

(SecondInt - sqrt(ThirdInt * LowInt)) / (HighInt - sqrt(ThirdInt * LowInt))

The system determines if this first ratio is less than a predefined value. This value
should be chosen to allow mutant base calls only when the highest hybridization
intensity is a desired ratio greater than the next highest hybridization intensity even
after the lowest two hybridization intensities are subtracted out.

A third test is whether a neighbor ratio is greater than a neighbor ratio
cutoff. The neighbor ratio is the following:

HighInt_n / (HighInt_n - sqrt(HighInt_(n-1) * HighInt_(n+1)))

where the subscript n designates values for the base position that is being called and
n+1 and n-1 represent values for adjacent base positions. Thus, the system
determines if the neighbor ratio is greater than a predefined value. This value
should be chosen to allow mutant base calls only when the highest hybridization
intensity is a desired ratio greater than the highest hybridization intensity with the
adjacent highest hybridization intensities subtracted out.

Accordingly, in a preferred embodiment only if all of the mutant
conditions are met will the base be called a mutant base. This embodiment
recognizes that mutations are fairly rare so a mutant base should only be called
when there is a high likelihood that there has been a mutation. If the mutant
conditions are not met, the base may be called as ambiguous or as the same as the
reference sequence (which statistically may be the correct base call).

Although a preferred embodiment utilizes three mutant conditions,
other embodiments may use a single mutant condition (e.g., one of the conditions
described above). Other embodiments may utilize other base calling methods
including the ones described in the U.S. Patent Applications previously
incorporated by reference.
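
The three mutant conditions can be sketched as one predicate. This is a sketch under stated assumptions: the text gives no numeric cutoffs, so every cutoff is a caller-supplied parameter; intensities are assumed to be positive (background-adjusted); the difference cutoff is assumed to be non-negative; and treating a zero denominator in the neighbor ratio as an infinite ratio is a choice made here, not something the text specifies.

```python
import math


def passes_mutant_conditions(high, second, third, low, high_prev, high_next,
                             diff_cutoff, first_ratio_cutoff,
                             neighbor_ratio_cutoff):
    """Return True only if all three mutant tests are met.

    high, second, third, low: HighInt, SecondInt, ThirdInt, LowInt.
    high_prev, high_next: HighInt at the adjacent base positions n-1 and n+1.
    """
    # Test 1: HighInt must exceed SecondInt by more than the difference cutoff.
    if high - second <= diff_cutoff:
        return False

    # Test 2: first ratio, with the geometric mean of the two lowest
    # intensities subtracted from numerator and denominator.
    baseline = math.sqrt(third * low)
    first_ratio = (second - baseline) / (high - baseline)
    if first_ratio >= first_ratio_cutoff:
        return False

    # Test 3: neighbor ratio against the adjacent positions' highest intensities.
    denominator = high - math.sqrt(high_prev * high_next)
    neighbor_ratio = high / denominator if denominator != 0 else math.inf
    return neighbor_ratio > neighbor_ratio_cutoff


# Hypothetical cutoffs chosen only to exercise the function.
print(passes_mutant_conditions(100, 20, 9, 4, 64, 36,
                               diff_cutoff=30, first_ratio_cutoff=0.5,
                               neighbor_ratio_cutoff=1.5))  # True
```

Because the first test is checked before the first ratio is formed, HighInt is strictly greater than SecondInt when the ratio is computed, so its denominator cannot be zero.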

Fig. 13 is a flowchart of a process of calling bases in a group of units. 
As indicated earlier, a unit includes multiple sets of related cells, where the related 
cells include probes that differ by a single base at an interrogation position. In a 
typical embodiment, the system initially receives input on the hybridization 
intensities (e.g., from the image data file produced by a scanner that scans the 
hybridized chip) and the structure of the probes that correspond to the 
hybridization intensities. In preferred embodiments, the background intensity (e.g.,
intensity measured from "blank" cells or other areas of the chip without probes) is
subtracted from the measured hybridization intensities. The background
subtracted hybridization intensities may also be limited to have a minimum
hybridization intensity of 1 (e.g., one photon count).

The hybridization intensity describes the extent of hybridization of
labeled target sequences that bound to probes in a cell.
Therefore, the hybridization intensities for the related cells of each unit at the base
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V- three unib 



'G' - one unit
'N' - one unit
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reference sequences; however, this does not mean that all of their hybridization 
information may not be utilized. Typically, it is assumed that the reference 
sequence for the standard group is expected to be the most identical to the target 
sequence. However, if one of the alternate groups is determined to be more
identical (i.e., better for making a base call), then that group will be used to make the
base call.

At step 502, the system computes base calls in the units of the 
standard and alternate groups. The base calling may be done as was described in
reference to Fig. 13.

The system then computes a base call for each group of units at step 
504. This may be accomplished by determining the base that is called most often
by the units. Alternatively, the base call for the group may be determined utilizing 
the process which will be described in more detail in reference to Fig. 15. 

After the system has determined a base call for each group of units 
(both the standard and alternate tilings), the system identifies a base position at step 
506. The system then determines the best group of units for this base position to be 
utilized to make the base call. In general, selecting the best group may involve 
determining which reference sequence of the groups has the fewest mismatches 
with the target sequence near or in a window around the interrogation position. 
The group of units that has the fewest mismatches near the interrogation position 
may have the highest likelihood of producing the most accurate base call. An 
embodiment of selecting the best group will be described in more detail in reference
to Fig. 16.

At step 510, the system calls the base at the identified base position
according to the best group of units (i.e., utilizing the base call for the group that
was computed at step 504). Once the base call has been made, the system
determines if there is a next base position for which to perform a base call. If there is
another base position to be called, the system proceeds to call that base position at
step 506.

Fig. 15 is a flowchart of a process of calling a base for a group of units.
At step 602, the system determines if a majority of units call the same base at the
specified base position. The majority is determined upon reference to only those
units that call a base (e.g., do not call as ambiguous or 'N'). For example, assume
that there are seven units and the following base calls have been made for the units:
'G' - three units
'T' - one unit
'N' - three units

As three out of four of the nonambiguous base calls are 'G', the system would
initially call the base as a 'G' for the group of units. The base will be called as the
majority base unless an exception rule applies.

Exception rules specify conditions which dictate what base call
should be made for the group of units. These rules may include conditions that
change a majority base call and may include conditions to deal with situations when
[...]. In a preferred embodiment, the [...]
of neighboring probes (i.e., one unit calls one base and another unit calls a different
base). Additionally, the rules may specify that if units call two different
bases with one of the calls being for the reference base, the system should call the
base as the reference for the group of units. Other exception rules are described in
the Appendix.

At step 606, the system determines if an exception rule applies. If an
exception rule does apply, the rule is applied at step 608.
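
The majority vote of Fig. 15 can be sketched as below. The sketch is hedged in two ways: the function name and argument layout are invented for illustration, and only one exception rule is modeled (two different bases called, one of them the reference base). Applying that rule only when there is no majority is an assumption, since the text leaves the precedence of exception rules to the Appendix.

```python
from collections import Counter


def call_group(unit_calls, reference_base=None):
    """Combine per-unit base calls into one call for a group of units."""
    # Only units that actually call a base take part in the vote.
    definite = [call for call in unit_calls if call in {"A", "C", "G", "T"}]
    if not definite:
        return "N"

    counts = Counter(definite)
    base, votes = counts.most_common(1)[0]

    # Step 602: a strict majority of the nonambiguous calls wins.
    if votes * 2 > len(definite):
        return base

    # Assumed exception rule: two different bases were called and one of
    # them is the reference base, so the reference base is called.
    if len(counts) == 2 and reference_base in counts:
        return reference_base

    return "N"


# Seven units: three call G, one calls T, three are ambiguous.
print(call_group(["G", "G", "G", "T", "N", "N", "N"]))  # G
```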

Fig. 16 is a flowchart of a process of selecting a best group of units for
performing a base call. Selecting the best group involves determining which
reference sequence of the groups has the fewest mismatches with the target
sequence near the interrogation position. The group of units that has the fewest
mismatches near the interrogation position may have the highest likelihood of
producing the most accurate base call. The window around the interrogation
position may be a predetermined value or set according to the probes. For
example, if the farthest the probes
extend from the interrogation position is eight base positions to one side of the
interrogation position and ten base positions to the other side of the interrogation
position, the window may be set as including this range of base positions.

At step 702, the system calculates mismatch scores for the standard
and alternate groups of units. The mismatch score is an indication of how many
mismatches a reference sequence appears to have with the target sequence. In
order to determine a mismatch score, the system may only analyze base positions
where at least two of the reference sequences differ. Thus, if all the reference
sequences are identical at a base position, this base position may be skipped.

At each base position where at least two reference sequences differ, 
the system determines if the base call for a group (the base call indicating the likely 
base in the target sequence) at each of these positions differs from the corresponding 
base of the reference sequence. If the base call and the base for the reference
sequence differ, the mismatch score is incremented by one. Initially, the mismatch
score for each group is set to zero.

Conceptually, it should be understood that the mismatch score is an 
indication of the number of base positions in a portion of the reference sequence 
that differ from the target sequence (optionally excluding those positions where all 
the reference sequences are the same). To better illustrate this concept the 
following simple example is presented. Assume there is a standard group and two 
alternate groups as follows: 

Standard Group                          Mismatch Score
reference    ACGGATGAGATACGA            1
base calls   ACTGATGAGATACGA

Alternate Group 1                       Mismatch Score
reference    ACTGATGAGATACGA            0
base calls   ACTGATGAGATACGA

Alternate Group 2                       Mismatch Score
reference    ACGGATGAGATACGT            2
base calls   ACTGATGAGATACGA

The underlined bases correspond to the base position which is being analyzed.
The bolded base positions indicate base positions where at least two of the reference
sequences differ (the third and fifteenth positions in this example). At these bolded
base positions, the standard group has one base position where the reference sequence
differs from the target sequence (as indicated by the base calls) so the mismatch score
is 1. Similarly, the first alternate group has a mismatch score of 0 and the second
alternate group has a mismatch score of 2.
As alternate group 1 has the lowest mismatch score, that group would
be utilized to call the base at the base position being analyzed. In this simple
example, the base call is the same for each group, as this example is
intended to illustrate how the best group may be selected. However, what is
important is that the invention recognizes that the more mismatches that occur near
a base position, the less accurate the base call will become. This result is brought
about by the fact that a mismatch between the reference sequence and the target
sequence creates an area where the probes interrogating neighboring base
positions include a single base mismatch. Single base mismatches lower the
hybridization intensity and may produce inaccurate results.
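
The mismatch scores in the example can be reproduced with a short sketch. The function name and the dictionary layout are assumptions for illustration, and the window around the interrogation position is not modeled: the whole 15-base segment is treated as the window.

```python
def mismatch_scores(references, base_calls):
    """Score each group by how often its reference disagrees with its base calls.

    Only positions where at least two reference sequences differ are counted.
    Both arguments map a group name to an equal-length sequence string.
    """
    names = list(references)
    length = len(references[names[0]])

    # Positions where the reference sequences are not all identical.
    informative = [
        i for i in range(length)
        if len({references[name][i] for name in names}) > 1
    ]

    return {
        name: sum(references[name][i] != base_calls[name][i] for i in informative)
        for name in names
    }


references = {
    "standard": "ACGGATGAGATACGA",
    "alternate 1": "ACTGATGAGATACGA",
    "alternate 2": "ACGGATGAGATACGT",
}
base_calls = {name: "ACTGATGAGATACGA" for name in references}

print(mismatch_scores(references, base_calls))
# {'standard': 1, 'alternate 1': 0, 'alternate 2': 2}
```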

At step 704, the system determines if the mismatch score of the standard
group is less than or equal to the mismatch scores of the alternate groups. If the
standard group has the lowest mismatch score (or ties), then the base call is performed
according to the standard group.

The system determines if a single alternate group has the lowest
mismatch score at step 708. If so, that alternate group is utilized to make the base
call at step 710. Otherwise, two or more alternate groups have the same lowest
mismatch score. The system then selects the alternate group that
includes units that most consistently called the base at step 712. For example, if
two alternate groups have the same lowest mismatch score but one group's units all
called the same base and the other group's units were split, the alternate group that
called the same base would be utilized. Other methods of determining the best
group in the event of a mismatch score tie may also be utilized.
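
The selection logic of Fig. 16, including the tie-break, can be sketched as follows. The names are hypothetical, and measuring "most consistently called" as the fraction of nonambiguous unit calls that agree with the most common call is an assumption; the text does not define a consistency measure.

```python
from collections import Counter


def select_group(scores, unit_calls, standard="standard"):
    """Pick the group used for the base call from per-group mismatch scores.

    scores: group name -> mismatch score.
    unit_calls: group name -> list of per-unit calls at this base position.
    """
    best = min(scores.values())

    # Step 704: the standard group wins if it has the lowest score or ties.
    if scores[standard] == best:
        return standard

    # Steps 708-710: a single alternate group with the lowest score.
    tied = [group for group, score in scores.items() if score == best]
    if len(tied) == 1:
        return tied[0]

    # Step 712: break the tie by how consistently the units called a base.
    def consistency(group):
        definite = [call for call in unit_calls[group] if call != "N"]
        if not definite:
            return 0.0
        return Counter(definite).most_common(1)[0][1] / len(definite)

    return max(tied, key=consistency)


scores = {"standard": 2, "alternate 1": 1, "alternate 2": 1}
unit_calls = {
    "standard": ["G", "G"],
    "alternate 1": ["G", "G", "G"],
    "alternate 2": ["G", "T", "G"],
}
print(select_group(scores, unit_calls))  # alternate 1
```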

Fig. 17A shows a screen display allowing analysis of nucleotides from
experiments from one or more chips. A screen display 802 includes multiple
screen areas that display different information to the user. A screen area 804
[...] are antisense regions (Protease, Reverse Transcriptase) of the HIV virus. The
reference sequence is typically used as a baseline with which to compare sample
sequences. Although the reference sequence on the screen may be the chip
wild-type sequence for which the chips were tiled, there is no requirement that this
is the case.

A screen area 806 includes the nucleotide sequence for the reference
sequence for the probe array. The base position of each nucleotide is shown above
screen area 806. Screen area 806 also shows the reference sequence for each unit if
"expanded" in the user interface.

A screen area 808 shows the user the chip and composite files that are 
currently being analyzed. A chip file (e.g., ends in ".CHF") includes data obtained
from a single chip. A composite file (e.g., ends in ".CMP") includes data obtained 
from more than one chip. When a user opens a chip or composite file for analysis, 
the pathname of the file is displayed in screen area 808. 

Information from the chip and composite files may be displayed in
screen areas 810 and 812. Screen area 810 includes the names of sample sequences
currently being analyzed from the chip or composite files. The name of the sample
sequence is typically chosen to enable the user to readily determine what the
sample sequence represents. Screen area 812 includes the nucleotide sequence for
the sample sequences. The base position of each nucleotide in screen area 812 is the
same as indicated above screen area 806. Accordingly, the system automatically
aligns the reference and sample sequences for easier analysis.

Fig. 17A has been described in order to familiarize the reader with the 
layout of the screen display. However, as illustrated by Fig. 17B, the invention 
allows the user to hide (not display) and summarize information from chip and
composite files. For example, if a user "clicks on" or activates the screen icon plus
sign in front of the composite filename in screen area 808, the system displays more
information about the composite file. As shown, the method that was utilized to
combine the information from the chip files may be shown along with the
individual chip files.

Additionally, if a user activates the screen icon plus sign in front of the 
chip filename in screen area 808, the system displays more information about the 
chip file including the process or procedure that was utilized to call bases. In Fig.
17B, the base calling procedure was the "Ratio Base Algorithm" which was
described in reference to Fig. 11. Additionally, the user is able to modify
parameters for the base calling procedure which will be immediately reflected in the 
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a particular target sequence. For each mismatch (MM) control in a high-density 
array there typically exists a corresponding perfect match (PM) probe that is
perfectly complementary to the same particular target sequence.

The process compares hybridization intensities of pairs of perfect 
match and mismatch probes that are preferably covalently attached to the surface of 
a substrate or chip. Most preferably, the nucleic acid probes have a density greater
than about 60 different nucleic acid probes per 1 cm² of the substrate. Although
the flowcharts show a sequence of steps for clarity, this is not an indication that the 
steps must be performed in this specific order. One of ordinary skill in the art 
would readily recognize that many of the steps may be reordered, combined, and 
deleted without departing from the invention. 

Initially, nucleic acid probes are selected that are complementary to 
the target sequence (or gene). These probes are the perfect match probes. 
Another set of probes is specified that are intended to be not perfectly
complementary to the target sequence. These probes are the mismatch probes and 
each mismatch probe includes at least one nucleotide mismatch from a perfect 
match probe. Accordingly, a mismatch probe and the perfect match probe from 
which it was derived make up a pair of probes. As mentioned earlier, the 
nucleotide mismatch is preferably near the center of the mismatch probe. 

The probe lengths of the perfect match probes are typically chosen to 
exhibit high hybridization affinity with the target sequence. For example, the 
nucleic acid probes may be all 20-mers. However, probes of varying lengths may
also be synthesized on the substrate for any number of reasons including resolving 
ambiguities. 

The target sequence is typically fragmented, labeled and exposed to a
substrate including the nucleic acid probes as described earlier. The hybridization
intensities of the nucleic acid probes are then measured and input into a computer
system. The computer system may be the same system that directs the substrate
hybridization or it may be a different system altogether. Of course, any computer
system for use with the invention should have available other details of the
experiment including possibly the gene name, gene sequence, probe sequences,
probe locations on the substrate, and the like.

At step 958, the hybridization intensities of the pair of probes are
compared to a difference threshold (D) and a ratio threshold (R). It is determined
if the difference between the hybridization intensities of the pair (Ipm - Imm) is greater
than or equal to the difference threshold AND the quotient of the hybridization
intensities of the pair (Ipm / Imm) is greater than or equal to the ratio threshold. The
difference thresholds are typically user defined values that have been determined to
produce accurate expression monitoring of a gene or genes. In one embodiment
the difference threshold is 20 and the ratio threshold is 1.2.

If Ipm - Imm >= D and Ipm / Imm >= R, the value NPOS is incremented at
step 960. In general, NPOS is a value that indicates the number of pairs of probes
which have hybridization intensities indicating that the gene is likely expressed.
NPOS is utilized in a determination of the expression of the gene.

At step 962 it is determined if Imm - Ipm >= D and Imm / Ipm >= R. If
this expression is true, the value NNEG is incremented at step 964. In general,
NNEG is a value that indicates the number of pairs of probes which have
hybridization intensities indicating that the gene is likely not expressed. NNEG,
like NPOS, is utilized in a determination of the expression of the gene.

For each pair that exhibits hybridization intensities either indicating
the gene is expressed or not expressed, a log ratio value (LR) and intensity
difference value (IDIF) are calculated at step 966. LR is calculated by the log of the
quotient of the hybridization intensities of the pair (Ipm / Imm). The IDIF is
calculated by the difference between the hybridization intensities of the pair
(Ipm - Imm). If there is a next pair of hybridization intensities at step 968, they are
retrieved at step 954.

At step 972, a decision matrix is utilized to indicate if the gene is 
expressed. The decision matrix utilizes the values N, NPOS, NNEG, and LR 
(multiple LRs). The following four assignments are performed: 

P1 = NPOS / NNEG

P2 = NPOS / N

P3 = (10 * SUM(LR)) / (NPOS + NNEG)
These P values are then utilized to determine if the gene is expressed. 
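
The tallies and the three P values can be sketched as below. Several points are assumptions rather than statements from the text: N is taken to be the total number of probe pairs, the logarithm is taken as base 10 (the text says only "the log"), intensities are assumed to be positive, and P1 is reported as infinity when NNEG is zero. The function name and the returned dictionary layout are hypothetical.

```python
import math


def expression_statistics(pairs, diff_threshold=20, ratio_threshold=1.2):
    """Tally NPOS/NNEG and compute P1, P2, P3 from (PM, MM) intensity pairs."""
    npos = nneg = 0
    log_ratios = []
    intensity_diffs = []

    for pm, mm in pairs:
        if pm - mm >= diff_threshold and pm / mm >= ratio_threshold:
            npos += 1  # steps 958-960: pair suggests the gene is expressed
        elif mm - pm >= diff_threshold and mm / pm >= ratio_threshold:
            nneg += 1  # steps 962-964: pair suggests it is not expressed
        else:
            continue   # pair is uninformative
        # Step 966: LR and IDIF for informative pairs only.
        log_ratios.append(math.log10(pm / mm))
        intensity_diffs.append(pm - mm)

    n = len(pairs)
    informative = npos + nneg
    return {
        "N": n,
        "NPOS": npos,
        "NNEG": nneg,
        "P1": npos / nneg if nneg else math.inf,
        "P2": npos / n if n else 0.0,
        "P3": 10 * sum(log_ratios) / informative if informative else 0.0,
        "mean_IDIF": sum(intensity_diffs) / informative if informative else 0.0,
    }


stats = expression_statistics([(100, 10), (50, 40), (10, 100), (200, 20)])
print(stats["NPOS"], stats["NNEG"], stats["P1"], stats["P2"])  # 2 1 2.0 0.5
```

The defaults of 20 and 1.2 are the difference and ratio thresholds given for one embodiment.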

For purposes of illustration, the P values are broken down into ranges.
[...] following:
Q = (P3 >= 1.5)
R = (1.5 > P3 >= 1.1)
S = (P3 < 1.1)
Absent      All other cases (e.g., any C combination)

In the output to the user, present may be indicated as "P," marginal as "M" and
absent as "A" at step 974.

Once all the pairs of probes have been processed and the expression of
the gene indicated, an average of ten times the LRs is computed at step 975.
Additionally, an average of the IDIF values for the probes that incremented NPOS
and NNEG is calculated, which may be utilized as an expression level. These
values may be utilized for quantitative comparisons of this experiment with other
experiments.

Quantitative measurements may be performed at step 976. For
example, the current experiment may be compared to a previous experiment (e.g.,
utilizing values calculated at step 970). Additionally, the experiment may be
compared to hybridization intensities of RNA (such as from bacteria) present in the 
biological sample in a known quantity. In this manner, one may verify the 
correctness of the gene expression indication or call, modify threshold values, or 
perform any number of modifications of the preceding. 

For simplicity, Fig. 19 was described in reference to a single gene. 
However, the process may be utilized on multiple genes in a biological sample. 
Therefore, any discussion of the analysis of a single gene is not an indication that 
the process may not be extended to processing multiple genes. 

Fig. 20 shows a screen display layout of gene expression monitoring 
software. A screen display 1000 is divided into two sections: a graphics display 
area 1002 and a data display area 1004. The graphics display area is for displaying 
graphs which will aid the user in interpreting the data. The data display area is for 
displaying the underlying data so the user may evaluate the underlying data for 
gene expression. 

As will be shown in subsequent screen displays, the data display area 
is preferably organized in a table having rows and columns. Each column has a 
heading indicating the data that resides in the column. Each row represents data 
from a single experiment or combination of experiments for a gene. The term 
"experiment" is used herein to describe a process that created data. For example, a 

Additionally, the user is able to sort the data in the data display area according to 
values in a selected column. 

Fig. 22 shows another screen display illustrating the analysis of a 
selected gene. A screen display 1060 includes a graphics display area 1062 
illustrating a graph of the ratio of the hybridization intensity of the perfect match 
probe to the mismatch probe at each base position. The x-axis is the base position 
and the y-axis is the ratio of hybridization intensities. The statistical ratio 
threshold is plotted on the graph, which in this example is 1.2. This graph may be 
utilized by the user to analyze how many probe pairs are above or below 
the threshold. The graph also includes the gene and experiment names. 

Fig. 23 shows a screen display illustrating the comparison of 
experiments for selected genes. A screen display 1160 includes a graphics display 
area 1062 and a data display area 1164. The graphics display area includes a graph 
of the ratio of the hybridization intensity of the perfect match probe to the mismatch 
probe at each base position for each of the experiments/genes selected in the data 
display area. In a preferred embodiment, the experiment name, gene name, and 
data plot are a different color for each gene to allow the user to more easily see the 
differences between or among selected genes. 

Fig. 24 shows another screen display illustrating the comparison of 
experiments for selected genes. A screen display 1200 includes a graphics display 
area 1202 illustrating the expression levels of genes selected in a data display area 
1204. The graph of the expression levels of the selected genes is a bar graph. In a 
preferred embodiment, the expression level is defined as the average intensity 
difference (see average(IDIF) in Fig. 19). The graph also includes the gene and 
experiment names. 

Fig. 25 shows another screen display illustrating the comparison of 
experiments for selected genes with multiple graphs in the graphics display area. 
A screen display 1230 includes a graphics display area 1232 depicting multiple 
graphs for analyzing the genes selected in a data display area 1234. An expression 
level graph 1236, an average intensity difference graph 1238 and a hybridization 
intensity graph 1240 are shown for the selected genes. 

Figs. 26A and 26B show the flow of a process of determining the 

the value NINC is incremented at step 1314. In general, NINC is a value that 
indicates the experimental pair of probes indicates that the gene expression is likely 
greater (or increased) than the baseline sample. NINC is utilized in a 
determination of whether the expression of the gene is greater (or increased), less 
(or decreased) or did not change in the experimental sample compared to the 
baseline sample. 

At step 1316, it is determined if (Ipm - Imm)B - (Ipm - Imm)E >= DDIF and 
(Ipm - Imm)B / (Ipm - Imm)E >= RDIF. If this expression is true, NDEC is incremented. In 
general, NDEC is a value that indicates the experimental pair of probes indicates 
that the gene expression is likely less (or decreased) than the baseline sample. 
NDEC is utilized in a determination of whether the expression of the gene is greater 
(or increased), less (or decreased) or did not change in the experimental sample 
compared to the baseline sample. 
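
The counting of NINC and NDEC can be sketched as below. This is only an illustration under stated assumptions: the test is assumed to compare the perfect match minus mismatch difference of the baseline (B) and experimental (E) samples against the thresholds DDIF and RDIF, the function name is hypothetical, and skipping pairs with a non-positive difference is a guard added for the sketch rather than something the text specifies.

```python
def count_changes(baseline, experimental, ddif, rdif):
    """Count probe pairs suggesting increased (NINC) or decreased (NDEC) expression.

    baseline and experimental are lists of (I_pm, I_mm) tuples, one per probe
    pair, in the same order. ddif and rdif are the difference and ratio
    thresholds.
    """
    ninc = ndec = 0
    for (b_pm, b_mm), (e_pm, e_mm) in zip(baseline, experimental):
        d_b = b_pm - b_mm
        d_e = e_pm - e_mm
        # Assumption: skip pairs whose PM - MM difference is not positive,
        # since the ratio test is not meaningful for them.
        if d_b <= 0 or d_e <= 0:
            continue
        if d_e - d_b >= ddif and d_e / d_b >= rdif:
            ninc += 1  # experimental pair looks increased
        elif d_b - d_e >= ddif and d_b / d_e >= rdif:
            ndec += 1  # experimental pair looks decreased
    return ninc, ndec
```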

For each of the pairs that exhibits hybridization intensities either 
indicating the gene is expressed more or less in the experimental sample, the values 
NPOS, NNEG and LR are calculated for each pair of probes. These values are 
calculated as discussed above in reference to Fig. 19. A suffix of either "B" or "E" 
has been added to each value in order to indicate if the value denotes the baseline 
sample or the experimental sample, respectively. If there are next pairs of 
hybridization intensities at step 132*, they are processed in a similar manner as 
shown. 

Referring now to Fig. 26B, an absolute decision computation is 
performed for both the baseline and experimental samples at step 1324. The 
absolute decision computation is an indication of whether the gene is expressed, 
marginal or absent in each of the baseline and experimental samples. Accordingly, 
in a preferred embodiment, this step entails performing steps 972 and 974 from Fig. 
19 for each of the samples. This being done, there is an indication of gene 
expression for each of the samples taken alone. 

At step 1326, a decision matrix is utilized to determine the difference 
in gene expression between the two samples. This decision matrix utilizes the 
values N, NPOSB, NPOSE, NNEGB, NNEGE, NINC, NDEC, LRB, and LRE as they 
were calculated above. The decision matrix performs different calculations 
depending on whether NINC is greater than or equal to NDEC. The calculations 
are as follows. If NINC >= NDEC, the following four P values are determined: 

P1 = NINC / NDEC 

P2 = NINC / N 

P3 = ((NPOSE - NPOSB) - (NNEGE - NNEGB)) / N 

P4 = 10 * SUM(LRE - LRB) / N 

These P values are then utilized to determine the difference in gene expression 
between the two samples. 

For purposes of illustration, the P values are broken down into ranges 
according to the following: 

A = (P1 >= 2.8) 
B = (2.8 > P1 >= 2.0) 
C = (P1 < 2.0) 

X = (P2 >= 0.34) 
Y = (0.34 > P2 >= 0.24) 
Z = (P2 < 0.24) 

M = (P3 >= 0.20) 
N = (0.20 > P3 >= 0.12) 
O = (P3 < 0.12) 

Q = (P4 >= 0.9) 
R = (0.9 > P4 >= 0.5) 
S = (P4 < 0.5) 

Thus, the P values are broken down into Boolean values. 

In this case where NINC >= NDEC, the gene expression change is 
indicated as increased, marginal increase or no change. The following is a 
summary of the gene expression indications: 

Increased          A and (X or Y) and (Q or R) and (M or N or O) 
                   A and (X or Y) and (Q or R or S) and (M or N) 
                   B and (X or Y) and (Q or R) and (M or N) 
                   A and X and (Q or R or S) and (M or N or O) 

Marginal Increase  A or Y or S or O 
                   B and (X or Y) and (Q or R) and O 
                   B and (X or Y) and S and (M or N) 
                   C and (X or Y) and (Q or R) and (M or N) 

No Change          All other cases (e.g., any Z combination) 
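
A minimal sketch of this decision matrix for the NINC >= NDEC case is given here. Several points are assumptions rather than statements from the text: the function names are hypothetical; the P1 thresholds are read as 2.8 and 2.0; a zero NDEC is treated as an infinite ratio; the return labels "I", "MI" and "NC" are shorthand for increased, marginal increase and no change; and the first Marginal Increase row, which reads "A or Y or S or O", is interpreted as the conjunction "A and Y and S and O", because that is the only A combination with X or Y that the Increased rows leave uncovered and a literal disjunction would conflict with the No Change row.

```python
def p_values_increase(n, ninc, ndec, nposb, npose, nnegb, nnege, lrb, lre):
    """Compute (P1, P2, P3, P4) for the case NINC >= NDEC.

    lrb and lre are per-pair LR values for the baseline and experimental
    samples, in the same order.
    """
    p1 = ninc / ndec if ndec else float("inf")  # assumption for NDEC == 0
    p2 = ninc / n
    p3 = ((npose - nposb) - (nnege - nnegb)) / n
    p4 = 10 * sum(e - b for b, e in zip(lrb, lre)) / n
    return p1, p2, p3, p4


def classify_increase(p1, p2, p3, p4):
    """Return "I" (increased), "MI" (marginal increase) or "NC" (no change)."""
    A, B, C = p1 >= 2.8, 2.0 <= p1 < 2.8, p1 < 2.0
    X, Y = p2 >= 0.34, 0.24 <= p2 < 0.34
    M, N, O = p3 >= 0.20, 0.12 <= p3 < 0.20, p3 < 0.12
    Q, R, S = p4 >= 0.9, 0.5 <= p4 < 0.9, p4 < 0.5

    increased = (
        (A and (X or Y) and (Q or R) and (M or N or O))
        or (A and (X or Y) and (Q or R or S) and (M or N))
        or (B and (X or Y) and (Q or R) and (M or N))
        or (A and X and (Q or R or S) and (M or N or O))
    )
    if increased:
        return "I"

    marginal = (
        (A and Y and S and O)  # assumed reading of "A or Y or S or O"
        or (B and (X or Y) and (Q or R) and O)
        or (B and (X or Y) and S and (M or N))
        or (C and (X or Y) and (Q or R) and (M or N))
    )
    if marginal:
        return "MI"

    # All other cases, including any Z combination.
    return "NC"
```

Checking the Increased rows before the Marginal Increase rows keeps the two sets of rows from competing for the same combination.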

In the output to the user, increased may be indicated as "I," marginal increase as 
"MI" and no change as "NC." 

If NINC < NDEC, the following four P values are determined: 

P1 = NDEC / NINC 

P2 = NDEC / N 

P3 = ((NNEGE - NNEGB) - (NPOSE - NPOSB)) / N 

P4 = 10 * SUM(LRE - LRB) / N 

These P values are then utilized to determine the difference in gene expression 

between the two samples. 

The P values are broken down into the same ranges as for the other 
case where NINC >= NDEC. Thus, the P values in this case use the same ranges, 
which will not be repeated for the sake of brevity. However, the ranges generally 
indicate different changes in the gene expression between the two samples as 
shown below. 

In this case where NINC < NDEC, the gene expression change is 
indicated as decreased, marginal decrease or no change. The following is a 
summary of the gene expression indications: 

Decreased          A and (X or Y) and (Q or R) and (M or N or O) 
                   A and (X or Y) and (Q or R or S) and (M or N) 
                   B and (X or Y) and (Q or R) and (M or N) 
                   A and X and (Q or R or S) and (M or N or O) 

Marginal Decrease  A or Y or S or O 
                   B and (X or Y) and (Q or R) and O 
                   B and (X or Y) and S and (M or N) 
                   C and (X or Y) and (Q or R) and (M or N) 

No Change          All other cases (e.g., any Z combination) 

In the output to the user, decreased may be indicated as "D," marginal decrease as 
"MD" and no change as "NC." 

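The above has shown how the relative difference between the gene 
expression in the two samples is determined. An additional test may be performed 
that would change an I, MI, D or MD call to NC if the gene is indicated as expressed 
in both samples (e.g., from step 1324) and the following expressions are all true: 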

Avcrage(IDIFB) >= 200 
Average(IDIFE) >= 200 

1.4 >= Average(IDIFE) / Average(IDIFB) >= 0.7 

The IDIFB and IDIFE averages are calculated as the sum of all the IDIFs for 
each sample divided by N. 
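
The additional test can be sketched as follows. The thresholds 200, 1.4 and 0.7 come from the expressions above; the function name, the call labels "I", "MI", "D", "MD" and "NC", and the precondition that the gene is called expressed in both samples are assumptions made for this illustration.

```python
def apply_no_change_override(call, expressed_b, expressed_e, idifs_b, idifs_e, n):
    """Return "NC" when the additional test overrides the call, else the call.

    call is one of "I", "MI", "D", "MD" or "NC". expressed_b and expressed_e
    say whether the gene was called expressed in the baseline and experimental
    samples. idifs_b and idifs_e hold the IDIF values of each sample, and n is
    the number of probe pairs (N).
    """
    if call not in {"I", "MI", "D", "MD"}:
        return call
    # Assumption: the test only applies when both samples are called expressed.
    if not (expressed_b and expressed_e):
        return call
    avg_b = sum(idifs_b) / n
    avg_e = sum(idifs_e) / n
    if avg_b >= 200 and avg_e >= 200 and 0.7 <= avg_e / avg_b <= 1.4:
        return "NC"
    return call
```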

At step 1328, values for quantitative analysis are 
calculated. An average of ((Ipm - Imm)E - (Ipm - Imm)B) for each of the pairs is 
calculated. Additionally, a quotient of the average of (Ipm - Imm)E and the average 
of (Ipm - Imm)B is calculated. These values may be utilized to compare the results 
with other experiments in step 1330. 
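
One possible reading of the step 1328 values is sketched below. It assumes that both values are built from the perfect match minus mismatch difference of each probe pair, with the experimental sample compared against the baseline; the function name and the direction of the subtraction and quotient are assumptions, not details confirmed by the text.

```python
def quantitative_values(baseline, experimental):
    """Return (average difference, quotient of averages) across probe pairs.

    baseline and experimental are non-empty lists of (I_pm, I_mm) tuples,
    one per probe pair, in the same order.
    """
    d_b = [pm - mm for pm, mm in baseline]
    d_e = [pm - mm for pm, mm in experimental]
    # Average of the per-pair change in the PM - MM difference (E minus B).
    avg_diff = sum(e - b for b, e in zip(d_b, d_e)) / len(d_b)
    # Quotient of the average experimental and average baseline differences.
    quotient = (sum(d_e) / len(d_e)) / (sum(d_b) / len(d_b))
    return avg_diff, quotient
```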

Fig. 27 A shows a screen display illustrating the monitoring of the 
change of gene expression between experiments. A screen display 1400 includes a 
graphics display area 1402 and a data display area 1404. A user begins the 
comparison of experiments for a gene by selecting two experiments for a gene. For 
simplicity, we will call one baseline data and the other experimental data, meaning 
it will be compared to the baseline. For example, a user may select two 
experiments for the gene with the name "gl82506." A comparison of two 
experiments is an experiment itself so the user is able to enter an experiment name, 
which was entered as 'too' in the data display area of Fig. 27A. Fig. 27B shows 
another screen display illustrating monitoring of the change of gene expression 
between experiments. 

The system then determines the change in gene expression between 
the selected experiments according to the process described in Figs. 26A and 26B. 
The data display area includes columns denoting the data produced by this 
comparison. The Experiment Name refers to a user-defined name for the 
comparison experiment. The Gene Name is the name of the gene. The numbers 
Inc and Dec refer to the values NINC and NDEC as described in reference to Fig. 
26A. More specifically, Inc refers to the number of base positions in the gene for 
which the difference and ratio of the perfect match and mismatch hybridization 
intensities are significantly greater in the experimental data. 

The Inc Ratio column indicates the number of base positions where the 
hybridization intensity increased divided by the total number of base positions in 
the gene which are analyzed. The Dec Ratio column indicates the number of base 
positions where the hybridization intensity decreased divided by the total number 
of base positions in the gene which are analyzed. The Pos Change column 
indicates the difference in the number of positive scoring probe pairs in the 
experimental data versus the baseline data. The Neg Change column indicates the 
difference in the number of negative scoring probe pairs (perfect match and 
mismatch) in the experimental data versus the baseline data. 
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thereon, such as RNA. The scope of the invention should, therefore, be determined 
not with reference to the above description, but instead should be determined with 
reference to the appended claims along with their full scope of equivalents. 

4 Brief Description of Drawings 

Fig. 1 illustrates an example of a computer system that may be used to 
execute software embodiments of the present invention; 

Fig. 2 shows a system block diagram of a typical computer system; 
Fig. 3 illustrates an overall system for forming and analyzing arrays of 

biological materials such as DNA or RNA; 

Fig. 4 is an illustration of an embodiment of software for the overall 

system; 

Fig. 5 illustrates the global layout of a chip formed in the overall 

system; 

Fig. 6 illustrates conceptually the binding of nucleic acid probes on 

chips to a labeled target; 

Fig. 7 illustrates nucleic acid probes arranged in lanes on a chip; 

Fig. 8 illustrates a hybridization pattern of a target on a chip with a 

reference sequence as in Fig. 7; 

Fig. 9 illustrates standard and alternate tilings; 

Fig. 10 shows a screen display of hybridization intensities from a chip; 

Fig. 11 is a flowchart of a process of computing a base call from 
hybridization intensities of related probes; 

Fig. 12 is a flowchart of another process of computing a base call from 

hybridization intensities of related probes; 

Fig. 13 is a flowchart of a process of calling bases in a group of units; 
Fig. 14 is a flowchart of a process of calling bases for multiple groups 

of units; 

Fig. 15 is a flowchart of a process of calling a base for a group of units; 
Fig. 16 is a flowchart of a process of selecting a best group of units for 

performing a base call; 

Figs. 17A and 17B show screen displays allowing analysis of 
nucleotides from experiments from one or more chips; 

Fig. 18 shows a high level flowchart of a process of monitoring the 
expression of a gene by comparing hybridization intensities of pairs of perfect 
match and mismatch probes; 

Fig. 19 shows a flowchart of a process of determining if a gene is 
expressed utilizing a decision matrix; 

Fig. 20 shows a screen display layout of gene expression monitoring 
software; 

Fig. 21 shows a screen display illustrating the analysis of a 
selected gene; 

Fig. 22 shows another screen display illustrating the analysis of a 
selected gene; 

Fig. 23 shows a screen display illustrating the comparison of 
experiments for selected genes; 

Fig. 24 shows another screen display illustrating the comparison of 
experiments for selected genes; 

Fig. 25 shows another screen display illustrating the comparison of 
experiments for selected genes; 

Figs. 26A and 26B show the flow of a process of determining the 
change in expression of a gene between samples; 

Figs. 27A and 27B show screen displays illustrating the monitoring of 
the change of gene expression between experiments; and 

Fig. 28 shows a screen display illustrating a three-dimensional bar 